ABSTRACT


This study describes a computerized item-selection
program called PAIN that uses a pattern-analysis approach
to select a most-valid subset of items from a set. The
results of this study indicate that PAIN is capable of
selecting a small subset of items which, when scored by
pattern analysis, has greater validity than the original
set. It appears that, as well as reducing the sizes of
standard tests without losing predictive value, PAIN may
also be of value in selecting biographical items of
information for use as predictors.

I. INTRODUCTION


Testing has played, and in all likelihood will continue
to play, a major role in the classification and selection
processes of both industry and the military. The vast
amounts of time and money expended in this area warrant
investigation of any methods which might increase the
efficiency of the techniques involved or improve the end
results.

It was the objective of this study to investigate one
such method.


II. NATURE OF THE PROBLEM


Testing has played an important role in the military
system of classification and placement for many years.
Basic schooling assignments and eligibility for promotion
are just two of the more important areas that have been
greatly influenced by testing and test interpretation. Yet,
on many occasions the critical time element involved in
testing and the lack of quality information available from
tests have combined to make classification and placement a
haphazard affair.

The U.S. Navy has taken steps to reduce the magnitude
of the testing problem by the development of a computer
program called SEQUIN (SEQUential Item Nominator). As a
result of the use of SEQUIN it has been shown not only that
the size of a test may be reduced without loss of validity,
but also that validity may actually be increased by using a
specially selected subset of questions from the original
test. In some cases as few as seven items from a test were
found to provide information equal to or better than that
provided by the complete test. This being the case, it
appeared that pattern analysis of a few selected items from
a test might be feasible.

The problem of using pattern analysis on a test even as
small as 30 items is one of sheer size. A 30-item test
yields over a billion possible patterns. The evaluation of
this number of patterns is a formidable job for even a
computer, not to mention the problem involved with
interpretation of individual results once all the patterns
have been evaluated. In fact, in order to establish a
predictor value for each pattern that could be encountered,
at least a billion subjects would already have had to take
the test under consideration.

Reducing the size of a test to seven items means that
only 128 patterns have to be analyzed. The number of
patterns involved is found by raising the number 2 to the
power indicated by the number of items in the test. A
subset of seven-item size would thus be suitable for
pattern analysis.
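
The arithmetic behind these figures can be checked with a short sketch. Python is used here purely for illustration (the programs of this study were written in FORTRAN), and the function name pattern_count is an assumption introduced for the example.

```python
def pattern_count(n_items: int) -> int:
    """Number of distinct right/wrong response patterns for n_items binary items."""
    return 2 ** n_items


# A seven-item subset versus a 30-item test.
for n in (7, 30):
    print(n, "items ->", pattern_count(n), "patterns")
# 7 items -> 128 patterns
# 30 items -> 1073741824 patterns
```

The 30-item figure, a little over one billion, is the source of the "over a billion possible patterns" cited above.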


The objective of this study was to devise and evaluate
a method of selecting items from a test that would optimize
the validity of the subset selected when scored by pattern
analysis.


III. DEVELOPMENT OF A SOLUTION


A. SUBJECTS OF THE STUDY 

The records of approximately 2,400 U.S. Navy enlisted 
men who had attended the Electronics Technician School at 
San Diego, California, after taking the Electronics Techni- 
cian Selection Test (ETST) were used as the source data of 
this study. The validation sample consisted of the first 
1,500 subjects in the records who had completed the course 
of instruction and been assigned a final school grade. The 
cross-validation sample was composed of the next 750
subjects who met the completion and final-grade assignment
requirements.

The ETST is made up of three parts totaling 70 items.
Part I consists of 20 items designed to test the subject in
the area of mathematics. Part II is of 20-item length also
and is related to science. Part III consists of items
directed at testing knowledge in the area of electricity
and radio and has 30 items.

Each of the items on the ETST was treated as a predictor 
variable to be compared with the criterion of final school 
grade at the Electronics Technician School.

The computer programs used in this study were written in 
the FORTRAN language and run on the IBM 360 computer at the 
U. S. Naval Postgraduate School, Monterey, California. The
INTEGER*2 numbering convention was used where possible in
programing to conserve core storage area. The increased
time involved in running the program with the use of this
convention was not considered critical for this study.


B. DATA CONVERSION 

The program to select items for pattern analysis was de- 
veloped on the premise that all items of the set being 
considered could be expressed in the form of a "yes-no" or
"correct-incorrect" answer. This simplified the programing
by allowing the item responses to be handled in a binary
form.

The conversion of the raw data was not suitable to a 
manual method of handling because over 168,000 responses 
required coding. The conversion was done by using the 
conversion program shown in the COMPUTER PROGRAMS section
(p. 31). This program facilitated the handling of the
large volume of information. Most of this program is unique
to the situation imposed on the author by the form of the
data available. However, the comments contained within this
program provide a guideline to the steps required in
converting data regardless of the nature of the data.
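
As a rough illustration of the kind of conversion described, the sketch below scores one subject's raw answers against an answer key to produce binary item responses. It is not the author's conversion program: the letter-coded answers, the sample data, and the function name to_binary are all assumptions made for the example.

```python
def to_binary(responses, key):
    """Return 1 for each response that matches the key and 0 otherwise."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [1 if given == correct else 0 for given, correct in zip(responses, key)]


# Hypothetical four-item key and one subject's answers.
key = "ABDD"
answers = "ABCD"
print(to_binary(answers, key))  # [1, 1, 0, 1]

# Volume of coding in this study: about 2,400 subjects times 70 items.
print(2400 * 70)  # 168000
```

The second figure agrees with the "over 168,000 responses" mentioned above.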


C. PAIN 

The author desired to develop a computer program, which
was to be called PAIN (Pattern Analysis Item Nominator),
that would select a subset of items from the ETST. SEQUIN
could already select a subset of items from the ETST but in
a way different from that proposed by PAIN. PAIN was based
on the belief that the pattern of responses could
contribute more to the overall value of a predictor than
was presently being obtained through the use of SEQUIN or
any other method. To do this it was necessary for PAIN to
be able to assign scores to each of the possible response
patterns associated with a subset of items. The score
assigned to a pattern in pattern analysis is the mean score
of all subjects in a sample who have that pattern. Once a
correlation coefficient was determined for a given subset,
it would then be necessary to compare this coefficient with
that obtained through the examination of every other subset
of the same size available from the main set. This was
impractical for reasons which will be explained, and an
alternate approach was necessary if PAIN was to be used.

The number of different subsets of N items that can be
formed from a 70-item set is expressed as the combination
of 70 items taken N at a time. This meant that to
investigate a subset as small as three items in size would
have involved the examination of 54,740 possible subsets,
each of which contained eight patterns of response. From
the information available concerning SEQUIN it appeared
that a subset of seven items would be necessary, at the
least, if improvement was desired over the methods
presently available.
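
The three-item count can be reproduced with the standard-library function math.comb (available in Python 3.8 and later). This is only a check of the arithmetic; the function name subset_count is an assumption introduced for the example.

```python
from math import comb


def subset_count(total_items: int, subset_size: int) -> int:
    """Number of distinct subsets of subset_size items drawn from total_items."""
    return comb(total_items, subset_size)


n_subsets = subset_count(70, 3)
patterns_per_subset = 2 ** 3

print(n_subsets)                        # 54740
print(patterns_per_subset)              # 8
print(n_subsets * patterns_per_subset)  # 437920 patterns in all
```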

A seven-item subset would allow the 1,500 subjects of
the validation sample to be placed in the 128 response
patterns involved with an average distribution of slightly
less than 12 subjects per pattern. This number was felt to
be sufficient to provide a fairly stable mean score for
each pattern. A second advantage of using seven items in
the subset was that it would allow comparison with the
work of Lieutenant K. Weinberg (personal communication).
Lieutenant Weinberg had used the same raw data to
investigate the validity of the seven items from the ETST
selected as the best predictors by SEQUIN. Unless a
reasonable alternative to the examination of all possible
response patterns was taken, however, this would have meant
the investigation of over 77 trillion patterns, a job that
was beyond even a computer approach. This was just for the
selection of the seventh item of the subset!

In order to overcome the problem of size, the assumption 
was made that once an item had been selected as the best 
for a subset of given size, it would continue to be a part 
of any larger subset. This allowed the author to say that 
the item selected as the best item for the subset of one 
item would be a part of the subset of two items, both of 
which would be part of the subset of three items, etc. This 
same approach is used in both stepwise regression and 
SEQUIN, and would reduce the selection process for the 
seventh item to an examination of slightly over 8,000
patterns. After PAIN was operative, tests were made to
determine the effects of the item-retention assumption on
the overall validity of the solution.
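
The "slightly over 8,000" figure follows from the item-retention assumption: with six items already fixed, each of the 64 remaining items is tried as the seventh, and each trial involves the 128 patterns of a seven-item subset. The sketch below works through that count; the function name stepwise_patterns is an assumption made for illustration.

```python
def stepwise_patterns(total_items: int, step: int) -> int:
    """Patterns examined when choosing item number `step` with earlier items retained."""
    candidates = total_items - (step - 1)  # items not yet in the subset
    return candidates * 2 ** step          # each candidate yields 2**step patterns


for step in range(1, 8):
    print(step, stepwise_patterns(70, step))
# The last line printed is: 7 8192
```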

To test the item-retention assumption, subsets of two
items each were selected randomly from each of the three
parts of the ETST to act as the two-item subset in the PAIN
program. The program was then allowed to select the items
for the completion of the subsets of six-item size. The
validities of these subsets were then compared with the
validity associated with the selection, by PAIN, of all the 
items in a subset. The two forced items were selected from 
individual parts of the ETST rather than the total ETST be- 
cause the results of the unrestricted selection by PAIN 
indicated that certain sections of the ETST were more valid 
than other sections. 

PAIN operated by computing mean criterion scores for each 
pattern of responses in a given subset, assigning these 
scores to subjects having that pattern of responses, and 
correlating assigned scores with the subjects’ final school 
grades. PAIN provided the following information when run:

1. Validities of all subsets examined.

2. A list of the items that form the most valid subset
of a given size.

3. The validity of the most valid subset of each size.
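
The procedure described above (score each pattern by its mean criterion value, correlate the assigned scores with the criterion, and add one item at a time while retaining earlier selections) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PAIN program itself: the function names, the list-of-lists data layout, and the zero-based item indices are all introduced here for the example.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def pattern_validity(responses, grades, items):
    """Validity of a subset scored by pattern analysis on one sample."""
    groups = defaultdict(list)
    for row, grade in zip(responses, grades):
        groups[tuple(row[i] for i in items)].append(grade)
    # Each pattern is scored with the mean grade of the subjects showing it.
    means = {p: sum(g) / len(g) for p, g in groups.items()}
    assigned = [means[tuple(row[i] for i in items)] for row in responses]
    return pearson(assigned, grades)


def select_items(responses, grades, subset_size):
    """Forward selection: keep earlier picks, add the item that maximizes validity."""
    chosen = []
    n_items = len(responses[0])
    for _ in range(subset_size):
        candidates = [i for i in range(n_items) if i not in chosen]
        best = max(candidates,
                   key=lambda i: pattern_validity(responses, grades, chosen + [i]))
        chosen.append(best)
    return chosen
```

A call such as select_items(responses, grades, 7) would return seven zero-based item indices in order of selection, where responses holds one list of 0/1 answers per subject and grades holds the matching criterion scores.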

The final form of PAIN is contained in the COMPUTER 
PROGRAMS section (p. 31). Representative run times and core
storage areas for this program on the IBM 360 computer are
contained in Table 2 (p. 22). Details on the roles of
important variables and how this program can be adapted for
general use are contained in APPENDIX A.

D. CROSS-VALIDATION

The program which the author calls "cross-validation"
is in fact a combination of two separate programs. The 
first section of the cross-validation program was written 
to obtain mean scores for patterns of responses to items 
selected from the validation sample. This was done in the 
validation program but could not be output because it was 
not known while the program was running which subset would 
eventually be wanted. Since the score for each pattern of 
response changed whenever a new item was examined, it would 
have been necessary to store all scores for each pattern in 
the computer until the best item for inclusion in the
subset was found, or to print out the pattern values of all
subsets examined. On the other hand, the process of
obtaining a mean score for each pattern was relatively easy
and quick once all of the items in the subset were known.

The second part of the cross-validation program did in 
fact perform cross-validation. The program assigned the
mean pattern scores from the validation sample to subjects 
having the same response patterns in the cross-validation 
sample and then correlated these scores with the final 
school grades of the 750 subjects in this sample. 

The fact that all patterns may not have been assigned
scores in the validation sample was handled by eliminating
subjects from the cross-validation sample who had patterns
that had not been assigned scores. This procedure was
considered acceptable because of the small number (eight)
of subjects who fell into this category for a seven-item
subset.
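
The two stages of the cross-validation program, including the elimination of subjects whose patterns were never scored, might be sketched as below. This is an illustrative outline only; the function names and data layout are assumptions and do not come from the author's listing.

```python
from collections import defaultdict


def pattern_means(responses, grades, items):
    """Mean criterion score for each response pattern seen in the validation sample."""
    groups = defaultdict(list)
    for row, grade in zip(responses, grades):
        groups[tuple(row[i] for i in items)].append(grade)
    return {p: sum(g) / len(g) for p, g in groups.items()}


def apply_means(means, responses, grades, items):
    """Assign validation-sample pattern means to cross-validation subjects.

    Subjects whose pattern has no score are dropped and counted.
    Returns (assigned_scores, kept_grades, n_dropped).
    """
    assigned, kept, dropped = [], [], 0
    for row, grade in zip(responses, grades):
        pattern = tuple(row[i] for i in items)
        if pattern in means:
            assigned.append(means[pattern])
            kept.append(grade)
        else:
            dropped += 1
    return assigned, kept, dropped
```

The validity in cross-validation is then the ordinary product-moment correlation between the assigned scores and the retained grades.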

The cross-validation program provided the following
information, given the items that form the subset:

1. A coded identification of the pattern of responses.
APPENDIX B explains how to construct the patterns from
the code.

2. The mean score for each pattern encountered in the
validation sample.

3. An indication of which patterns of response were not
encountered in the validation sample.

4. The validity of the validation sample.

5. The validity in cross-validation.

6. The number of subjects eliminated from the
cross-validation sample because their patterns were not
scored in validation.

The cross-validation program was also used to investi- 
gate what improvement in validity was obtainable through 
the use of PAIN over a random selection of a subset of
items to use in pattern analysis. 

The final form of the cross-validation program is
contained in the COMPUTER PROGRAMS section (p. 31). For a
seven-item subset this program had a running time of
approximately ten seconds on the IBM 360 computer and used
a core storage area of approximately 80K. APPENDIX A
explains in detail how this program can be adapted for
general use.

IV. RESULTS


PAIN selected items 20, 14, 40, 56, 7, 5, and 33, in that
order, as the most predictive seven-item subset of the
ETST. The validity of the seven items was .828 in
validation for 1,500 subjects and .778 for the
cross-validation of 750 subjects.

By using the pattern-analysis technique to score the 
subset of items selected by PAIN, it was possible to exceed 
the validity that had previously been attached to the ETST 
as a predictor of final school grades. The Navy had
determined the predictive validity of the ETST to be
approximately .61 using the total number of items correct
out of all 70 as the predictor. A subset of as few as three
items selected by PAIN was capable of establishing a
predictive validity of .66 in cross-validation.

The cross-validation results for PAIN-selected items
were an improvement over the .72 value of validity obtained
in cross-validation by Lieutenant Weinberg in his study of 
SEQUIN's seven best items for predicting final school 
grades. 

In the 15 cases investigated, the random selection of
the first two items of the subset did not improve the
validity of any six-item subsets (see Table 3). The items
selected for the six-item subset under these conditions
consistently included items selected by PAIN when no
constraints were placed on the selection process. Eight
subsets selected only one item each that had not appeared
in the unconstrained solution. Item number 16 was the only
"new" item to appear in a six-item subset more than once,
and it appeared in six different subsets.

Random selection of seven items for pattern analysis 
also consistently resulted in lower validities than those 
obtained through the use of PAIN (See Table 4). 

The author attempted using PAIN to select more than 
seven items from the ETST in order to determine at what 
point, if any, the validities in validation and/or
cross-validation would level off or decline. At the
ten-item-subset level, which was near the program size
limit imposed by the computer system's core storage
capacity, the validity was still increasing for the
validation sample. The cross-validation sample, on the
other hand, did show a decline in predictive validity at
the ten-item level (see Figure 1, p. 17, and Table 1).

Examination of the assigned scores associated with the
128 patterns representing the best seven-item subset
indicated that score assignments were not always directly
related to the number of items correct in the subset. In
some cases four correct items in the subset were assigned
a lower score than three correct items. Also, the score
assigned to getting only a certain item correct in a subset
of one size was not always the same as the score assigned
to getting only that item correct in a different size
subset.

FIGURE 1


CORRELATION COEFFICIENTS IN VALIDATION AND CROSS-VALIDATION
OF
ITEMS SELECTED BY PAIN


[Plot: correlation coefficient (vertical axis) against
number of items in subset (horizontal axis);
x - validation, o - cross-validation]

Note: See Table 1 for exact values of validity coefficients.

V. CONCLUSIONS


This study has in part confirmed the value of using
pattern analysis as a predictive technique. While a method
such as stepwise regression would have assigned a value to
each of the items in the subset eventually selected by
PAIN, the pattern-analysis approach allowed for the change
in value of each of these items when used in different
combinations. It appears that PAIN obtained a higher
validity than SEQUIN or stepwise-regression techniques
(MN 4112 student projects in progress concurrent with this
writing) because more complete use was made of the
information available when a pattern-analysis approach to
evaluation was used.

PAIN and the scoring section of the cross-validation 
program have provided a predictor of Electronics Technician 
School final grades superior to any other known to the 
author at the time of this writing. By using this technique 
of item selection on other tests, a series of short, highly 
predictive tests for other areas requiring evaluation could 
be formed. A word of caution is appropriate though. The 
subjects of this study answered the questions used to form 
PAIN's seven-item subset while taking the 70-item ETST. The 
effect on the results of taking a seven-item versus 70-item 
test was not known at the time of this writing. Until it is 
determined that the shortness of the test is without
adverse effect, the full implementation of testing based on
subsets cannot proceed.

The consistent appearance of certain items in subsets 
formed with and without constraints on PAIN, coupled with 
the lower validities resulting when PAIN was constrained, 
would seem to indicate that the process of retaining items 
previously selected does not reduce the overall validity of 
a subset. Although the solution obtained by this method may 
not be a true optimal solution, it is questionable how much
can be gained by an attempt at examining all possible
solutions.

The author would have preferred to use a much larger
sample than that used so that a much closer examination
could have been made of the point at which subset size
increases lack value. It is probable that the decline in
validity in cross-validation at the ten-item level
experienced in this study was a result of having fewer than
two subjects available for each pattern of response. In
fact, at the ten-item level over 13 percent of the
cross-validation sample was unusable because of the lack of
a scored pattern. The problem of availability and a desire
on the author's part to avoid the possible problems
associated with taking subjects who received final school
grades based on differing grading systems caused the
restriction on the size of the samples used.


A key area for more investigation is that of selecting
predictors based on biographical information. Preliminary
studies by others who have used PAIN (concurrent MN 4112
students) indicate that this technique of selection may
have great value as a method of analyzing biographical data
in relation to various criteria. This would seem logical if
one will agree that patterns of information play a more
important role in the area of biographical information than
in the area of testing. APPENDIX C gives an explanation of
how some biographical information can be converted into the
binary form necessary for PAIN. APPENDIX C also contains
examples of some of the preliminary results obtained by
using PAIN in conjunction with biographical information.

TABLE 1


CORRELATION COEFFICIENTS FOR SUBSETS OF THE ETST
IN
VALIDATION AND CROSS-VALIDATION


NUMBER OF ITEMS
IN SUBSET          VALIDATION     CROSS-VALIDATION

TABLE 2


PAIN PROGRAM RUNNING TIMES AND CORE STORAGE REQUIREMENTS
FOR
VARIOUS SIZE SUBSETS


NUMBER OF ITEMS    APPROXIMATE      APPROXIMATE
IN SUBSET          RUNNING TIME     CORE STORAGE REQUIRED


Note: Figures presented are based on evaluating a 70-item
test from a sample of 1,500 subjects. Running times are for
IBM 360 computer.

TABLE 3


PAIN RESULTS WITH CONSTRAINTS PLACED 
ON THE 
FIRST TWO ITEMS OF A SUBSET 


TWO ITEMS ADDITIONAL ITEMS 
CONSTRAINED SELECTED SUBSET VALIDITY 


6, 16 
207 
1, 20 
24 28 
6st 
6, 8 


Note: Additional items selected are presented in the order 

TABLE 4


VALIDATION AND CROSS-VALIDATION RESULTS
OF
PATTERN ANALYSIS TECHNIQUE
USED ON
RANDOMLY SELECTED SUBSETS OF SEVEN ITEMS


RANDOMLY SELECTED
ITEMS                    VALIDATION    CROSS-VALIDATION

12,14,38,41,42,48,56     .75235        .67070

4,40,58,63,66,67,70      .66504        .59745

15,23,40,48,54,58,59     .67265        .57414

4,10,17,26,34,45,55      .71356        .69899

6,14,41,42,53,56,66      .73571        .66944


Note: PAIN validation and cross-validation results for
seven items were .82843 and .77829 respectively.

APPENDIX A


PAIN AND CROSS-VALIDATION PROGRAM CONVERSION
FOR GENERAL USE


The PAIN and cross-validation programs in this study
can be used to process any data of a binary nature simply
by altering the contents of the DIMENSION and DATA
statements and ensuring that the READ statement conforms to
the device from which the data is being read. This APPENDIX
is a detailed check list of how the DATA and DIMENSION
statements should be set up by the user.


A. PAIN DATA STATEMENT 

1. N1 is set equal to the size of the sample being used
in validation.

2. N2 is set equal to one (1). The program will handle
increasing this variable to conform to the size of the
subset under consideration.

3. N3 is set at a value equal to or greater than the
integer range of the criterion scores. If the criterion
scores are not in an integer form, conversion must be made
before the data are read into the program so that the
matrices involved can be addressed. A Data Conversion
program can be altered to do this if such a program is
used. Example: A 3.45 criterion score can be converted to a
345 criterion score. If conversion were not made in
advance, the program would truncate this criterion score to
3. See A7 in this Appendix for further details. The
alternative to using integer criterion values would require
a degree of manipulation within PAIN that is unwarranted
for most cases.

4. N4 is set equal to two (2). The program will handle
the increasing of this variable to conform to the number of
patterns within a given subset size.

5. N5 is set equal to the final number of items desired
in the subset.

6. N6 is set equal to the total number of items in the
set under investigation.

7. INDEX is equal to a value one less than the lower
number used in determining the range of the criterion (N3).
Example: If the criterion were student grades on a 4.0
grading scale and the investigator did not know the actual
value of the lowest grade in the sample, but knew the
lowest grade was higher than 1.5, the value of N3 could be
set as low as 40-15, or 25, and the value of INDEX would be
15-1, or 14. Note that if the lowest value for a final
grade had actually been 1.5 instead of higher than 1.5 the
conversion would have been 40-14, or 26, for the value of
N3 and 14-1, or 13, for the value of INDEX. Although PAIN
can be run without going through the process of assigning a
value to INDEX, in many cases the core storage and running
time of the program can be greatly reduced by using the
INDEX variable.

B. CROSS-VALIDATION DATA STATEMENT

1. In order to perform cross-validation both the
validation and cross-validation data must be read into the
program, in that order.

2. N1, N3, and INDEX follow the same rules as in the
PAIN DATA statement.

3. N2 is set equal to the number of items in the subset
being cross-validated or for which mean pattern scores are
desired.

4. N4 is set equal to the number of patterns associated
with the subset size being cross-validated. This value is
equal to 2 to the N2 power.

5. N7 is set equal to the number of subjects in the
cross-validation sample. Setting this value to zero (0)
results in processing only the mean pattern scores for the
validation sample.


C. DIMENSION STATEMENTS

1. The variables in the DIMENSION statements are
dimensioned according to the comments at the beginning of
PAIN and the cross-validation programs.

2. Definitions of the variables involved in the
DIMENSION statements are contained in the list of non-dummy
variables preceding the computer programs in the COMPUTER
PROGRAMS section (p. 31).





APPENDIX B 
INTERPRETING PATTERN CODES 


All patterns of response were converted from binary
to decimal form within the PAIN and cross-validation
programs so that matrix addresses could be used. Hence, the
list of patterns printed as output by the cross-validation
program is in decimal form. The user of this program simply
needs to subtract one from the pattern number in the
output to obtain the actual decimal equivalent of the
binary number of the pattern referenced. For example, the
cross-validation program assigned a mean pattern score of
73.0 to the pattern listed as number 58. This would convert
to the decimal equivalent 57, which yields the binary
pattern 0111001. Since the items of the subset used were
read into the cross-validation program in the order 5, 7,
14, 20, 33, 40, 56, the pattern would indicate that a
"correct" or "yes" answer to items 7, 14, 20, and 56
coupled with an "incorrect" or "no" answer to items number
5, 33, and 40 predicts a criterion score of 73.0.
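
The decoding rule can be sketched in a few lines. The function name decode_pattern is an assumption made for illustration, and the sketch assumes, as the worked example above implies, that the first item read corresponds to the leftmost binary digit.

```python
def decode_pattern(listed_number: int, items):
    """Map a listed pattern number to a {item: 0 or 1} response pattern."""
    value = listed_number - 1                 # listed numbers are offset by one
    bits = format(value, f"0{len(items)}b")   # zero-padded binary string
    return {item: int(bit) for item, bit in zip(items, bits)}


items = [5, 7, 14, 20, 33, 40, 56]
print(format(58 - 1, "07b"))      # 0111001
print(decode_pattern(58, items))
# {5: 0, 7: 1, 14: 1, 20: 1, 33: 0, 40: 0, 56: 1}
```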

APPENDIX C
BIOGRAPHICAL INFORMATION AND PAIN


To convert biographical information into the binary
form necessary for use in PAIN requires questions to be
formulated such that answers can be expressed in a yes-no
form.

Questions that at first do not appear to fit a yes-no
format are already being sectioned into parts so that the
format can be used. For example, the question, "How old are
you?", can be handled in the following way and often is:

How old are you? Check one box

1. under 21   [ ]

2. 21 - 30    [X]

3. 31 - 40    [ ]

4. over 40    [ ]

The boxes without checks can be considered as "no" answers
in this example with the checked box a "yes". The one
question, "How old are you?", can now be handled by PAIN as
four separate items. If PAIN should indicate that one or
more of these items represent good predictors, those items
could be further sectioned for further evaluation. Other
types of biographical information can also be handled in
this manner.
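
A minimal sketch of this sectioning, assuming ages are recorded in whole years and using the four brackets shown above; the function name age_items is hypothetical and is not part of the programs of this study.

```python
def age_items(age: int):
    """Convert an age in whole years into four yes/no (1/0) items."""
    return [
        int(age < 21),         # 1. under 21
        int(21 <= age <= 30),  # 2. 21 - 30
        int(31 <= age <= 40),  # 3. 31 - 40
        int(age > 40),         # 4. over 40
    ]


print(age_items(25))  # [0, 1, 0, 0]
```

Exactly one of the four items is a "yes" for any given age, so the four items together carry the same information as the single checked box.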

The author knows of at least two studies that were
being conducted by students at the U. S. Naval Postgraduate
School at the time of this writing which involved the use
of PAIN for selecting items of a biographical nature for
use in predicting various criteria. A study using the final
QPA's of students in the Masters program at the U. S. Naval
Postgraduate School as the criterion has yielded
encouraging preliminary results. While samples were too
small to justify comparison in cross-validation, PAIN
provided uniformly higher validities than stepwise
regression in validation.

The second study involved predicting drug addiction.
This study had found a four-item subset that had a validity
of over .60 in validation. No cross-validation results were
available at the time of this writing, but any reasonable
retention of validity in cross-validation could provide an
extremely useful predictor in this field.

COMPUTER PROGRAMS


LIST OF NON-DUMMY VARIABLES USED IN COMPUTER PROGRAMS


ANS(I)       correct response to item I of test

B(J)         jth subject's assigned decimal value for his
             binary pattern of responses

C(J)         jth subject's final school grade used for the
             criterion

C1           sum of the criterion scores

C2           sum of the squares of the criterion scores

D(J)         jth subject's identification number

F(M,N)       the joint frequency distribution of patterns
             versus criterion scores

INDEX        a value equal to the lowest score used in
             determining range (N3) minus 1

[illegible]  as used in the conversion program only, the number
             of the data card on which information is stored

M            row in the "F" matrix representing number of a
             pattern in a given subset

N            column in the "F" matrix representing the
             criterion score

N1           the size of the sample used in validation

N2           number of items in the subset being considered

N3           coded range of the criterion scores

N4           number of patterns in the subset being considered

N5           size of subset desired

N6           total number of items in set being investigated

N7           the size of the sample used in cross-validation

P(I,J)       jth subject's binary response to item i

R2           the correlation coefficient determined from the
             use of raw scores

S(I)         the mean pattern score for pattern i

S1           sum of the criterion scores for a given pattern

S2           sum of the subjects with a given pattern

W(I)         answer given by subject to item i of test

X(J)         jth subject's mean pattern score

X1           sum of mean pattern scores

X2           sum of squares of mean pattern scores

ITEM SELECTION PROGRAM